{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AlexNet\n",
    "1. 在LeNet提出后的将近20年里，神经网络一度被其他机器学习方法超越，如支持向量机。\n",
    "2. 虽然LeNet可以在早期的小数据集上取得好的成绩，但是在更大的真实数据集上的表现**并不尽如人意**。\n",
    "\n",
    "神经网络可以直接基于图像的原始像素进行分类。这种称为端到端（end-to-end）的方法节省了很多中间步骤。然而，在很长一段时间里更流行的是研究者通过勤劳与智慧所设计并生成的手工特征。这类图像分类研究的主要流程是：\n",
    "\n",
    "1. 获取图像数据集；\n",
    "2. 使用已有的特征提取函数生成图像的特征；\n",
    "3. 使用机器学习模型对图像的特征分类。\n",
    "\n",
    ">机器学习研究者 : 优雅的定理证明了许多分类器的性质。机器学习领域生机勃勃、严谨而且极其有用。\n",
    ">计算机视觉研究者，则是另外一幅景象。他们会告诉你图像识别里“不可告人”的现实是：计算机视觉流程中真正重要的是数据和特征。也就是说，使用较干净的数据集和较有效的特征甚至比机器学习模型的选择对图像分类结果的影响更大。\n",
    "\n",
    "## 手工设计  or 学习特征\n",
    "\n",
    ">在相当长的时间里，特征都是基于各式各样手工设计的函数从数据中提取的  比如 \n",
    "### SIFT\n",
    "\n",
    "\n",
    "另一些研究者则持异议。他们认为特征本身也应该由学习得来。他们还相信，为了表征足够复杂的输入，特征本身应该分级表示。\n",
    "\n",
    "持这一想法的研究者相信，多层神经网络可能可以**学得数据的多级表征，并逐级表示越来越抽象的概念或模式**。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AlexNet 的结构\n",
    "2012年，AlexNet横空出世。这个模型的名字来源于论文第一作者的姓名Alex Krizhevsky [1]。AlexNet使用了8层卷积神经网络，并以很大的优势赢得了ImageNet 2012图像识别挑战赛。它首次证明了学习到的特征可以超越手工设计的特征，从而一举打破计算机视觉研究的前状。\n",
    "\n",
    "\n",
    "AlexNet与LeNet的设计理念非常相似，但也有显著的区别。\n",
    "\n",
    "第一，与相对较小的LeNet相比，AlexNet包含8层变换，其中有5层卷积和2层全连接隐藏层，以及1个全连接输出层。下面我们来详细描述这些层的设计。\n",
    "\n",
    "AlexNet第一层中的卷积窗口形状是$11\\times11$。因为ImageNet中绝大多数图像的高和宽均比MNIST图像的高和宽大10倍以上，ImageNet图像的物体占用更多的像素，所以需要更大的卷积窗口来捕获物体。第二层中的卷积窗口形状减小到$5\\times5$，之后全采用$3\\times3$。此外，第一、第二和第五个卷积层之后都使用了窗口形状为$3\\times3$、步幅为2的最大池化层。而且，AlexNet使用的卷积通道数也大于LeNet中的卷积通道数数十倍。\n",
    "\n",
    "紧接着最后一个卷积层的是两个输出个数为4096的全连接层。这两个巨大的全连接层带来将近1 GB的模型参数。由于早期显存的限制，最早的AlexNet使用双数据流的设计使一个GPU只需要处理一半模型。幸运的是，显存在过去几年得到了长足的发展，因此通常我们不再需要这样的特别设计了。\n",
    "\n",
    "第二，AlexNet将sigmoid激活函数改成了更加简单的ReLU激活函数。一方面，ReLU激活函数的计算更简单，例如它并没有sigmoid激活函数中的求幂运算。另一方面，ReLU激活函数在不同的参数初始化方法下使模型更容易训练。这是由于当sigmoid激活函数输出极接近0或1时，这些区域的梯度几乎为0，从而造成反向传播无法继续更新部分模型参数；而ReLU激活函数在正区间的梯度恒为1。因此，若模型参数初始化不当，sigmoid函数可能在正区间得到几乎为0的梯度，从而令模型无法得到有效训练。\n",
    "\n",
    "第三，AlexNet通过丢弃法（参见3.13节）来控制全连接层的模型复杂度。而LeNet并没有使用丢弃法。\n",
    "\n",
    "第四，AlexNet引入了大量的图像增广，如翻转、裁剪和颜色变化，从而进一步扩大数据集来缓解过拟合。我们将在后面的9.1节（图像增广）详细介绍这种方法。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import torch\n",
    "from torch import nn, optim\n",
    "import torchvision\n",
    "\n",
    "import sys\n",
    "sys.path.append(\"..\") \n",
    "\n",
    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "\n",
    "class AlexNet(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(AlexNet, self).__init__()\n",
    "        self.conv = nn.Sequential(\n",
    "            nn.Conv2d(1, 96, 11, 4), # in_channels, out_channels, kernel_size, stride, padding\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(3, 2), # kernel_size, stride\n",
    "            # 减小卷积窗口，使用填充为2来使得输入与输出的高和宽一致，且增大输出通道数\n",
    "            nn.Conv2d(96, 256, 5, 1, 2),\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(3, 2),\n",
    "            # 连续3个卷积层，且使用更小的卷积窗口。除了最后的卷积层外，进一步增大了输出通道数。\n",
    "            # 前两个卷积层后不使用池化层来减小输入的高和宽\n",
    "            nn.Conv2d(256, 384, 3, 1, 1),\n",
    "            nn.ReLU(),\n",
    "            nn.Conv2d(384, 384, 3, 1, 1),\n",
    "            nn.ReLU(),\n",
    "            nn.Conv2d(384, 256, 3, 1, 1),\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(3, 2)\n",
    "        )\n",
    "         # 这里全连接层的输出个数比LeNet中的大数倍。使用丢弃层来缓解过拟合\n",
    "        self.fc = nn.Sequential(\n",
    "            nn.Linear(256*5*5, 4096),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(0.5),\n",
    "            nn.Linear(4096, 4096),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(0.5),\n",
    "            # 输出层。由于这里使用Fashion-MNIST，所以用类别数为10，而非论文中的1000\n",
    "            nn.Linear(4096, 10),\n",
    "        )\n",
    "\n",
    "    def forward(self, img):\n",
    "        feature = self.conv(img)\n",
    "        output = self.fc(feature.view(img.shape[0], -1))\n",
    "        return output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AlexNet(\n",
      "  (conv): Sequential(\n",
      "    (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4))\n",
      "    (1): ReLU()\n",
      "    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "    (3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))\n",
      "    (4): ReLU()\n",
      "    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "    (6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (7): ReLU()\n",
      "    (8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (9): ReLU()\n",
      "    (10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (11): ReLU()\n",
      "    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (fc): Sequential(\n",
      "    (0): Linear(in_features=6400, out_features=4096, bias=True)\n",
      "    (1): ReLU()\n",
      "    (2): Dropout(p=0.5)\n",
      "    (3): Linear(in_features=4096, out_features=4096, bias=True)\n",
      "    (4): ReLU()\n",
      "    (5): Dropout(p=0.5)\n",
      "    (6): Linear(in_features=4096, out_features=10, bias=True)\n",
      "  )\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "net = AlexNet()\n",
    "print(net)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):\n",
    "    \"\"\"Download the fashion mnist dataset and then load into memory.\"\"\"\n",
    "    trans = []\n",
    "    if resize:\n",
    "        trans.append(torchvision.transforms.Resize(size=resize))\n",
    "    trans.append(torchvision.transforms.ToTensor())\n",
    "    \n",
    "    transform = torchvision.transforms.Compose(trans)\n",
    "    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)\n",
    "    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)\n",
    "\n",
    "    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)\n",
    "    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=4)\n",
    "\n",
    "    return train_iter, test_iter\n",
    "\n",
    "batch_size = 128\n",
    "# 如出现“out of memory”的报错信息，可减小batch_size或resize\n",
    "train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练\n",
    "\n",
    "这时候我们可以开始训练AlexNet了。相对于LeNet，由于图片尺寸变大了而且模型变大了，所以需要更大的显存，也需要更长的训练时间了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "training on  cuda\n",
      "epoch 1, loss 0.6057, train acc 0.771, test acc 0.859, time 78.3 sec\n",
      "epoch 2, loss 0.1679, train acc 0.876, test acc 0.875, time 64.1 sec\n",
      "epoch 3, loss 0.0962, train acc 0.893, test acc 0.902, time 73.8 sec\n",
      "epoch 4, loss 0.0652, train acc 0.904, test acc 0.891, time 71.6 sec\n",
      "epoch 5, loss 0.0476, train acc 0.912, test acc 0.901, time 64.3 sec\n"
     ]
    }
   ],
   "source": [
    "import dl_utils\n",
    "import imp\n",
    "imp.reload(dl_utils)\n",
    "lr, num_epochs = 0.001, 5\n",
    "optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
    "dl_utils.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "- AlexNet跟LeNet结构类似，但使用了更多的卷积层和更大的参数空间来拟合大规模数据集ImageNet。它是浅层神经网络和深度神经网络的分界线。\n",
    "- 虽然看上去AlexNet的实现比LeNet的实现也就多了几行代码而已，但这个观念上的转变和真正优秀实验结果的产生令学术界付出了很多年。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
